{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "LtZEPBbW0A1p"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/jeffheaton/app_deep_learning/blob/main/assignments/assignment_yourname_class4.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4AGR8dD60A1q"
   },
   "source": [
    "# T81-558: Applications of Deep Neural Networks\n",
    "* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)\n",
    "* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).\n",
    "\n",
    "**Module 4 Assignment: Classification and Regression Neural Network**\n",
    "\n",
    "**Student Name: Your Name**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "EjjWZ6js0A1r"
   },
   "source": [
    "# Assignment Instructions\n",
    "\n",
    "For this assignment, you will use the **crx.csv** dataset.  This dataset is a public dataset that can you can find [here](https://archive.ics.uci.edu/ml/datasets/credit+approval). You should use the CSV file on my data site, at this location: [crx.csv](https://data.heatonresearch.com/data/t81-558/crx.csv) because it includes column headers.  The primary use for this dataset is binary classification. There are 15 attributes, plus a target column that contains only + or -.  Some of the columns have missing values.\n",
    "\n",
    "You should train a neural network and return the predictions.  You will submit these predictions to the **submit** function.  See [Assignment #1](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/assignments/assignment_yourname_class1.ipynb) for details on how to submit an assignment or check that one was submitted.\n",
    "\n",
    "Complete the following tasks:\n",
    "\n",
    "* Your task is to replace missing values in columns *a2* and *a14* with values estimated by a neural network (one neural network for *a2* and another for *a14*).\n",
    "* Your submission file will contain the same headers as the source CSV: *a1*, *a2*, *s3*, *a4*, *a5*, *a6*, *a7*, *a8*, *a9*, *a10*, *a11*, *a12*, *a13*, *a14*, *a15*, and *a16*.\n",
    "* You should only need to modify *a2* and *a14*.\n",
    "* Neural networks can be much more powerful at filling missing variables than median and mean.\n",
    "* Train two neural networks to predict *a2* and *a14*.  \n",
    "* The *y* (target) for training the two nets will be *a2* and *a14*, depending on which you are trying to fill.\n",
    "* The x for training the two nets will be 's3','a8','a9','a10','a11','a12','a13','a15'.  These are chosen because it is important not to use any columns with missing values; also, it could cause unwanted bias if we include the ultimate target (*a16*).\n",
    "* ONLY predict new values for missing values in *a2* and *a14*.\n",
    "* You will likely get this small warning:  Warning: The mean of column a14 differs from the solution file by 0.20238937709643778. (might not matter if small)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "kFOmkNMN0A1r"
   },
   "source": [
    "# Google CoLab Instructions\n",
    "\n",
    "If you are using Google CoLab, it will be necessary to mount your GDrive so that you can send your notebook during the submit process. Running the following code will map your GDrive to ```/content/drive```."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "sXzFrMZc0A1r",
    "outputId": "76715f6c-3fb7-4b4e-e5cd-43014bc99d5f"
   },
   "outputs": [],
   "source": [
    "try:\n",
    "    from google.colab import drive\n",
    "    drive.mount('/content/drive', force_remount=True)\n",
    "    COLAB = True\n",
    "    print(\"Note: using Google CoLab\")\n",
    "except:\n",
    "    print(\"Note: not using Google CoLab\")\n",
    "    COLAB = False\n",
    "\n",
    "# Early stopping (see module 3.4)\n",
    "import copy\n",
    "\n",
    "class EarlyStopping:\n",
    "    def __init__(self, patience=5, min_delta=0, restore_best_weights=True):\n",
    "        self.patience = patience\n",
    "        self.min_delta = min_delta\n",
    "        self.restore_best_weights = restore_best_weights\n",
    "        self.best_model = None\n",
    "        self.best_loss = None\n",
    "        self.counter = 0\n",
    "        self.status = \"\"\n",
    "\n",
    "    def __call__(self, model, val_loss):\n",
    "        if self.best_loss is None:\n",
    "            self.best_loss = val_loss\n",
    "            self.best_model = copy.deepcopy(model.state_dict())\n",
    "        elif self.best_loss - val_loss >= self.min_delta:\n",
    "            self.best_model = copy.deepcopy(model.state_dict())\n",
    "            self.best_loss = val_loss\n",
    "            self.counter = 0\n",
    "            self.status = f\"Improvement found, counter reset to {self.counter}\"\n",
    "        else:\n",
    "            self.counter += 1\n",
    "            self.status = f\"No improvement in the last {self.counter} epochs\"\n",
    "            if self.counter >= self.patience:\n",
    "                self.status = f\"Early stopping triggered after {self.counter} epochs.\"\n",
    "                if self.restore_best_weights:\n",
    "                    model.load_state_dict(self.best_model)\n",
    "                return True\n",
    "        return False\n",
    "\n",
    "# Make use of a GPU or MPS (Apple) if one is available.  (see module 3.2)\n",
    "import torch\n",
    "\n",
    "device = (\n",
    "    \"mps\"\n",
    "    if getattr(torch, \"has_mps\", False)\n",
    "    else \"cuda\"\n",
    "    if torch.cuda.is_available()\n",
    "    else \"cpu\"\n",
    ")\n",
    "print(f\"Using device: {device}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "N8l6b87y0A1t"
   },
   "source": [
    "# Assignment Submit Function\n",
    "\n",
    "You will submit the ten programming assignments electronically.  The following **submit** function can be used to do this.  My server will perform a basic check of each assignment and let you know if it sees any underlying problems.\n",
    "\n",
    "**It is unlikely that should need to modify this function.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "9T6mAmM50A1t"
   },
   "outputs": [],
   "source": [
    "import base64\n",
    "import os\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import requests\n",
    "import PIL\n",
    "import PIL.Image\n",
    "import io\n",
    "\n",
    "# This function submits an assignment.  You can submit an assignment as much as you like, only the final\n",
    "# submission counts.  The paramaters are as follows:\n",
    "# data - List of pandas dataframes or images.\n",
    "# key - Your student key that was emailed to you.\n",
    "# no - The assignment class number, should be 1 through 1.\n",
    "# source_file - The full path to your Python or IPYNB file.  This must have \"_class1\" as part of its name.\n",
    "# .             The number must match your assignment number.  For example \"_class2\" for class assignment #2.\n",
    "def submit(data,key,no,source_file=None):\n",
    "    if source_file is None and '__file__' not in globals(): raise Exception('Must specify a filename when a Jupyter notebook.')\n",
    "    if source_file is None: source_file = __file__\n",
    "    suffix = '_class{}'.format(no)\n",
    "    if suffix not in source_file: raise Exception('{} must be part of the filename.'.format(suffix))\n",
    "    with open(source_file, \"rb\") as image_file:\n",
    "        encoded_python = base64.b64encode(image_file.read()).decode('ascii')\n",
    "    ext = os.path.splitext(source_file)[-1].lower()\n",
    "    if ext not in ['.ipynb','.py']: raise Exception(\"Source file is {} must be .py or .ipynb\".format(ext))\n",
    "    payload = []\n",
    "    for item in data:\n",
    "        if type(item) is PIL.Image.Image:\n",
    "            buffered = BytesIO()\n",
    "            item.save(buffered, format=\"PNG\")\n",
    "            payload.append({'PNG':base64.b64encode(buffered.getvalue()).decode('ascii')})\n",
    "        elif type(item) is pd.core.frame.DataFrame:\n",
    "            payload.append({'CSV':base64.b64encode(item.to_csv(index=False).encode('ascii')).decode(\"ascii\")})\n",
    "    r= requests.post(\"https://api.heatonresearch.com/assignment-submit\",\n",
    "        headers={'x-api-key':key}, json={ 'payload': payload,'assignment': no, 'ext':ext, 'py':encoded_python})\n",
    "    if r.status_code==200:\n",
    "        print(\"Success: {}\".format(r.text))\n",
    "    else: print(\"Failure: {}\".format(r.text))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true,
    "id": "iKv-IaUK0A1t"
   },
   "source": [
    "# Assignment #4 Sample Code\n",
    "\n",
    "The following code provides a starting point for this assignment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "mguSM-9C0A1t"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import pandas as pd\n",
    "from scipy.stats import zscore\n",
    "import pandas as pd\n",
    "import io\n",
    "import requests\n",
    "import numpy as np\n",
    "from sklearn import metrics\n",
    "\n",
    "# This is your student key that I emailed to you at the beginnning of the semester.\n",
    "key = \"H3B554uPhc3f8kirGGBYA7cYuDOamhXM87OY6QH1\"  # This is an example key and will not work.\n",
    "key = \"uTtH5yNbPs9tZdRWsBf9V9FaQA9RU2iP5cL7F3zH\"\n",
    "\n",
    "\n",
    "# You must also identify your source file.  (modify for your local setup)\n",
    "file='/content/drive/My Drive/Colab Notebooks/assignment_solution_class4.ipynb'  # Google CoLab\n",
    "# file='C:\\\\Users\\\\jeffh\\\\projects\\\\t81_558_deep_learning\\\\assignments\\\\assignment_yourname_class4.ipynb'  # Windows\n",
    "# file='/Users/jeff/projects/t81_558_deep_learning/assignments/assignment_yourname_class4.ipynb'  # Mac/Linux\n",
    "\n",
    "# Begin assignment\n",
    "df = pd.read_csv(\"https://data.heatonresearch.com/data/t81-558/crx.csv\",na_values=['?'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "w7nhJr_i0A1u",
    "outputId": "5e898135-a70c-4fb0-ecd3-634524045c4c"
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "from scipy.stats import zscore\n",
    "from sklearn.model_selection import train_test_split\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.optim as optim\n",
    "from torch.utils.data import DataLoader, TensorDataset\n",
    "import tqdm\n",
    "from sklearn.preprocessing import LabelEncoder\n",
    "\n",
    "\n",
    "def fill_missing_numeric(df, current, target):\n",
    "    pass\n",
    "\n",
    "df_submit = fill_missing_numeric(df, 'a2', 'a16')\n",
    "df_submit = fill_missing_numeric(df_submit, 'a14', 'a16')\n",
    "\n",
    "submit(source_file=file,data=[df_submit],key=key,no=4)\n"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3.9 (torch)",
   "language": "python",
   "name": "pytorch"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
